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Abstract 

A  condensed  history  and  theoretical  development  of  lattice-gas  au¬ 
tomata  in  the  Boltzmann  limit  is  presented.  This  is  provided  as  back¬ 
ground  to  set  up  the  context  for  understanding  the  implementation  of 
the  lattice-gas  method  on  two  parallel  supercomputers:  the  MIT  cellu¬ 
lar  automata  machine  CAM-8  and  the  Connection  Machine  CM-5.  The 
macroscopic  limit  of  two-dimensional  fluids  is  tested  by  simulating  the 
Rayleigh-Benard  convective  instability,  Kelvin-Helmholtz  shear  instabil¬ 
ity,  and  the  Von  Karman  vortex  shedding  instability.  Performance  of  the 
two  machines  in  terms  of  both  site  update  rate  and  maximum  problem 
size  are  comparable.  The  CAM-8,  being  a  low-cost  desktop  machine, 
demonstrates  the  potential  of  special-purpose  digital  hardware. 

Contents 

1  Introduction  3 

2  Some  Historical  Notes  5 

3  Why  lattice-gases?  8 

4  Lattice-Gas  Automata  9 

4.1  Some  Preliminaries  about  the  Local  Dyanamics .  10 

4.2  Coarse-Grained  Dynamics .  11 

4.3  Triangular  Lattice:  B=6 .  12 

4.4  Single  Particle  Multispeed  Fermi-Dirac  Distribution  Function  .  .  14 

4.5  Hexagonal  Lattice  .  17 


1 


Report  Documentation  Page 


Form  Approved 
OMB  No.  0704-0188 


Public  reporting  burden  for  the  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information, 
including  suggestions  for  reducing  this  burden,  to  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington 
VA  22202-4302.  Respondents  should  be  aware  that  notwithstanding  any  other  provision  of  law,  no  person  shall  be  subject  to  a  penalty  for  failing  to  comply  with  a  collection  of  information  if  it 
does  not  display  a  currently  valid  OMB  control  number. 


1.  REPORT  DATE 

23  NOV  1993 


2.  REPORT  TYPE 


3.  DATES  COVERED 


4.  TITLE  AND  SUBTITLE 

Lattice-Gas  Automata  Fluids  on  Parallel  Supercomputers 


5a.  CONTRACT  NUMBER 


5b.  GRANT  NUMBER 


6.  AUTHOR(S) 


5c.  PROGRAM  ELEMENT  NUMBER 


5d.  PROJECT  NUMBER 


5e.  TASK  NUMBER 
5f.  WORK  UNIT  NUMBER 

7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES)  8.  PERFORMING  ORGANIZATION 

Phillips  Laboratory ,Hanscom  Field, MA, 01731  report  number 

9.  SPONSORING/MONITORING  AGENCY  NAME(S )  AND  ADDRESS(ES )  10.  SPONSOR/MONITOR' S  ACRONYM(S) 

11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 

12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  unlimited 

13.  SUPPLEMENTARY  NOTES 

The  original  document  contains  color  images. 

14.  ABSTRACT 

A  condensed  history  and  theoretical  development  of  lattice-gas  automata  in  the  Boltzmann  limit  is 
presented.  This  is  provided  as  background  to  set  up  the  context  for  understanding  the  implementation  of 
the  lattice-gas  method  on  two  parallel  supercomputers:  the  MIT  cellular  automata  machine  CAM-8  and 
the  Connection  Machine  CM-5.  The  macroscopic  limit  of  two-dimensional  fluids  is  tested  by  simulating  the 
Rayleigh-B’enard  convective  instability,  Kelvin-Helmholtz  shear  instability,  and  the  Von  Karman  vortex 
shedding  instability.  Performance  of  the  two  machines  in  terms  of  both  site  update  rate  and  maximum 
problem  size  are  comparable.  The  CAM-8,  being  a  low-cost  desktop  machine,  demonstrates  the  potential  of 
special-purpose  digital  hardware. 


15.  SUBJECT  TERMS 


16.  SECURITY  CLASSIFICATION  OF: 


a.  REPORT 

unclassified 


b.  ABSTRACT 

unclassified 


c.  THIS  PAGE 

unclassified 


17.  LIMITATION  OF 

18.  NUMBER 

ABSTRACT 

OF  PAGES 

43 

19a.  NAME  OF 
RESPONSIBLE  PERSON 


Standard  Form  298  (Rev.  8-98) 

Prescribed  by  ANSI  Std  Z39-18 


5  The  Cellular  Automata  Machine  CAM-8  19 

6  The  Connection  Machine  CM-5  20 

7  Gallery  of  Computational  Results  23 

7.1  Rayleigh-Benard  Convection  on  the  CAM-8 .  23 

7.2  Kelvin-Helmholtz  Instability  on  the  CAM-8 .  24 

7.3  Von  Karman  Streets  on  the  CM-5  .  25 

8  Discussion  25 

9  Acknowledgements  27 

List  of  Figures 

1  Illustration  on  a  hexagonal  lattice  of  the  two-step  collision  and  streaming 

process  required  to  complete  a  single  time  step:  (a)  initial  configuration,  (b) 
collision  by  permutation,  and  (c)  streaming  of  particles .  38 

2  (a)  Lattice  vector  label  convention;  (b)  Hexagonal  lattice  convention  with 

lattice  directions  a  =  3  up  and  a  =  6  down.  Coordinates  above  the  lattice 
nodes  are  (i,j)  memory  array  indices .  38 

3  MIT  Laboratory  for  Computer  Science  cellular  automata  machine  CAM-8. 

This  8  module  prototype  can  evolve  a  D-dimensional  cellular  space  with  32 
million  sites  where  each  site  has  16  bits  of  data  with  a  site  update  rate  of 

200  million  per  second .  39 

4  (a)  A  single  processing  node,  with  DRAM  site  data  flowing  through  an  SRAM 

lookup  table  and  back  into  DRAM,  (b)  Spatial  array  of  CAM-8  nodes,  with 
nearest-neighbor  (mesh)  interconnect  (1  wire/bit-slice  in  each  direction).  .  .  39 

5  CM-5  Node:  SPARCCPU,  32  Mbytes  of  memory  and  4  Vector  processing 

units .  40 

6  Node  Memory  Layout .  40 

7  Thermo  13-bit  CAM-8  Experiment:  Rayleigh-Benard  convection  cells  at  the 
critical  Rayleigh  number.  Lattice  Size:  2048  x  1024.  Time  Average:  100. 

Spatial  Average:  64  x  64.  Mass  Density  Fraction=l/5.  Data  presented  at 
50,000  time  steps .  41 

8  Momentum  and  vorticity  map  of  two-dimensional  shear  instability  on  the 
CAM-8.  Lattice  size  of  4096  X  2048  with  toroidal  boundary  conditions. 
Spacetime  averaging  over  128x128  blocks  for  50  time  steps.  FHP  collisions 
with  spectators  and  a  rest  particle.  Data  presented  at  time  steps  0,  10000, 

and  30000  with  Galilean  velocity  shift .  42 

9  FHP-II  CM-5  Experiment:  Von  Karman  Streets  Lattice  Size:  4096  X  2048. 

Time  Average:  None.  Spatial  Average:  128x128.  Mass  Density  Fractional /7. 

Data  presented  at  32,000  time  steps .  43 

10  Performance  runs  on  a  256- node  CM-5  for  an  FHP  hexagonal  lattice  em¬ 

bedded  into  a  3D  mesh.  Performance  significantly  suffers  by  communication 
overhead  for  small  lattice  sizes .  43 


2 


1  Introduction 


Lattice-gas  automata  dynamics  is  a  discrete  form  of  molecular  dynamics.  In 
molecular  dynamics  one  simulates  a  many-body  system  of  particles  with  con¬ 
tinuous  interaction  potentials  [1].  The  particles  have  continuous  positions  and 
momenta.  In  lattice-gas  dynamics  the  particles’  positions  and  momenta  are 
discrete  and  motion  is  constrained  to  a  spacetime  crystallographic  lattice. 

One  may  also  view  lattice-gas  automata  dynamics  as  an  extension  of  cellular 
automata,  popularized  in  the  physics  community  by  Stephen  Wolfram  [2,  3] .  An 
elementary  treatment  of  the  cellular  automata  subject  is  presented  by  Tommaso 
Toffoli  and  Norman  Margolus  in  their  book  on  cellular  automata  machines  [4] . 
Corresponding  to  the  cellular  automata  paradigm,  lattice-gas  automata  are  ide¬ 
ally  suited  for  massively  parallel  processing.  In  lattice-gas  automata  models, 
each  possible  momentum  state  at  a  given  position  is  represented  by  a  single 
digital  bit.  Therefore,  a  Pauli  exclusion  principle  is  enforced  where  there  can 
be  no  more  than  a  single  particle  per  momentum  state.  As  a  particle  in  state 
a  at  some  lattice  site  of  the  crystallographic  space  “hops”  into  state  (3,  say  at 
a  neighboring  site,  a  digital  bit  is  moved  from  a  and  into  (3.  So  in  lattice-gas 
dynamics  one  simulates  a  system  of  Boolean  particles  where  the  data  streaming 
corresponds  to  spatial  translation  and  the  data  permutation  corresponds  to  col- 
lisional  interactions.  This  computational  analog  of  particle  dynamics  offers  an 
exciting  alternative  to,  and  not  simply  an  approximation  of,  the  usual  partial 
differential  equations  [5].  So  the  lattice-gas  methodology  has  an  intrinsic  value 
beyond  finite  difference  schemes. 

The  lattice  gas  approach  has  been  extended  to  a  lattice-based  Krook-Bhatnager- 
Gross  approximation  of  the  Boltzmann  equation  [6,  7,  8,  9],  which  we  call  the 
lattice-Boltzmann  equation.  In  place  of  the  exactly  computable  dynamics  of 
Boolean  particles,  one  focuses  on  a  statistical  regime  where  a  particle  has  a 
probability  of  occupying  a  given  momentum  state.  Moreover,  many  particles 
can  occupy  the  same  momentum  state  at  the  same  position.  The  approach  of¬ 
fers  both  theoretical  and  computational  advantages.  An  important  theoretical 
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advantage  is  that  one  may  capture  the  essential  physics  of  the  complex  system  by 
stating  no  more  than  the  system’s  equilibrium  distribution.  Computationally  it 
has  the  advantage  of  reducing  noisy  fluctuations  in  the  system  at  the  expense  of 
discarding  information  concerning  particle-particle  correlations.  Lattice-gas  au¬ 
tomata  in  the  Boltzmann  approximation  have  become  one  of  the  most  important 
contributions  by  the  lattice-gas  community  to  high-performance  computational 
physics  [10].  The  lattice-Boltzmann  equation,  usually  implemented  within  one 
hundred  lines  of  code  on  a  massively  parallel  processor  1,  allows  the  researcher 
to  efficiently  model  complex  systems,  providing  a  straightforward  particle-based 
metaphor  to  computation.  Yet  it  relies  on  expensive  floating  point  calculations 
and  therefore  is  most  suited  for  massively  parallel  machines  such  as  the  Connec¬ 
tion  Machine-5  (CM-5).  In  this  paper,  we  use  the  lattice-Boltzmann  equation 
only  for  theoretical  analysis — all  of  our  simulations  are  based  on  lattice  gases. 

We  have  implemented  a  general  lattice-gas  automaton  on  two  parallel  archi¬ 
tectures:  the  experimental  MIT  CAM-8  and  the  Thinking  Machines  Corporation 
CM-5.  Very  briefly,  a  CAM-8  node  has  222  16-bit  sites  (8  Mbytes  DRAM)  and 
a  double  buffered  look-up  table  (2  Mbits  SRAM),  with  a  clock  speed  of  25  MHz. 
There  are  no  processors,  only  lookup  tables.  The  nodes  are  connected  in  a  3D 
mesh.  A  CM-5  node  has  32  Mbytes  of  DRAM,  four  vector  units  at  16  MHz,  and 
a  SPARC  processor  at  32  MHz.  A  fat-tree  network  connects  the  nodes.  The 
architectures  of  both  machines  will  be  discussed  in  more  detail  below. 

The  presentation  here  is  limited  to  two-dimensional  fluids.  Three-dimensional 
hydrodynamics,  immiscible  fluids  [11,  12, 13,  14],  multiphase  systems  [15,  16,  17, 
18],  reaction-diffusion  systems,  magnetohydrodynamics  [19],  and  flow  through 
porous  media  [20,  21]  are  also  subjects  of  active  research.  We  have  implemented 
hydrodynamic  and  thermohydrodynamic  lattice-gases  on  the  CAM-8,  the  first 
lattice-gas  experiment  conducted  on  the  prototype  machine.  Results  of  the 
CAM-8  and  CM-5  simulations  of  the  Rayleigh-Benard  convection  instability, 
Kelvin-Helmholtz  shear  instability,  and  Von-Karman  vortex  shedding  instabil¬ 
ity  are  presented. 

Our  main  findings  are  the  following.  The  CAM-8  delivers  25  million  site 
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updates  per  second  per  module  for  a  16  bit  lattice  gas.  The  CM-5  in  principle 
can  simulate  larger  lattices  due  to  its  much  larger  memory  size  (16Gbytes  on  a 
512-node  partition).  In  practice  the  typical  problem  sizes  were  almost  identi¬ 
cal  on  a  128-node  partition  of  the  CM-5  and  on  an  8-module  CAM-8  prototype: 
4096  x  2048  2D  lattices  on  both  machines.  The  CM-5  can  deliver  about  1  million 
site  updates  per  second  per  node  for  a  16  bit  lattice  gas.  For  the  CM-5  inter¬ 
processor  communication  cost  is  small  relative  to  the  computation  involved  in 
the  data  streaming  and  site  update  when  the  size  of  the  lattice  is  large  enough. 
This  suggests  that  the  relatively  low  communication  bandwidth  imposes  no  se¬ 
rious  degradation  on  the  delivered  performance.  We  also  find  that  the  delivered 
performance  increases  as  the  lattice  size  increases,  see  appendix  9. 

The  best  way  to  report  performance  of  each  machine  for  lattice-gas  simula¬ 
tion  is  to  give  the  site  update  rate.  The  best  rate  achieved  on  the  thermohy- 
clrodynamic  lattice-gas  was  191  million  site  updates  per  second  on  an  8-module 
CAM-8  and  110  million  on  on  a  256-node  partition  of  the  CM-5.  For  a  simpler 
problem,  the  FHP  lattice-gas,  a  256-node  partition  of  the  CM-5  attained  a  site 
update  rate  of  550  million  updates  per  second  on  a  32K  by  2K  lattice.  The  best 
rate  obtainable  on  the  8-module  CAM-8  for  the  same  simple  problem  is  382  mil¬ 
lion  site  updates  per  second.  The  CAM-8  prototype  is  a  desktop  machine  (see 
figure  3)  containing  approximately  the  same  amount  of  digital  logic  and  mem¬ 
ory  as  is  found  in  a  common  workstation.  The  efficiency  of  the  CAM-8  comes 
at  a  severe  price,  in  terms  of  its  specialization  to  a  certain  class  of  scientific 
computations,  albeit  its  applicability  is  widening  [22]. 

2  Some  Historical  Notes 

Let  us  briefly  review  some  of  the  historical  developments  of  the  lattice-gas  sub¬ 
ject.  An  overview  of  the  lattice-gas  subject  has  also  been  given  by  Boghosian 
[23].  Simple  implementations  of  a  discrete  molecular  dynamics  on  a  square 
lattice  were  investigated  in  the  early  1970’s  by  the  French,  in  particular  Yves 
Pomeau  and  coworkers  [24].  By  the  late  1970’s,  cellular  automata  research  was 
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underway  at  the  Information  Mechanics  Group  at  MIT  on  reversible  computa¬ 
tion  by  Edward  Fredkin,  Tommaso  Toffoli,  and  Norman  Margolus[25,  26,  27]. 
The  idea  of  building  special-purpose  machines  to  simulate  physics-like  models  on 
a  fine-grained  space  [28,  27]  originated  there  and  today  still  remains  a  strength 
of  that  group.  A  good  review  of  the  kind  of  cellular  automata  modeling  done 
in  the  early  1980’s  is  given  by  Gerard  Vichniac  [29].  During  this  time,  Stephen 
Wolfram  visited  the  Information  Mechanics  Group  and  was  stimulated  by  their 
work.  In  1983  Wolfram  popularized  cellular  automata  as  a  simple  mathematical 
model  to  investigate  self-organization  in  statistical  mechanics [2,  30]. 

After  visiting  the  MIT  Information  Mechanics  Group  in  1983  and  seeing 
a  TM-gas  simulation  on  the  CAM-5  machine  of  Toffoli  and  Margolus[5,  28], 
Pomeau  realized  the  potential  for  simulating  large  fluid  systems  and  much  new 
interest  and  activity  in  the  field  emerged.  A  race  began  to  theoretically  prove 
that  a  hydrodynamic  limit  emerges  from  simple  lattice-gas  automata.  The  in¬ 
tense  interest  was  not  stirred  as  much  by  the  subject  of  hydrodynamics  itself, 
but  instead  by  the  possibility  of  a  simple  cellular  space-time  model  capturing 
such  complex  natural  behavior  in  an  exact  way.  In  1985  Wolfram  completed  the 
first  hydrodynamics  simulations  on  a  triangular  lattice  [31]  on  the  Connection 
Machine — at  this  time,  lattice-gases  were  one  of  the  most  important  applications 
for  the  bit  oriented  single  instruction  multiple  data  Connection  Machine.  By 
1986  Frisch,  Hasslacher,  and  Pomeau  had  reported  the  existence  of  an  isotropic 
two-dimensional  lattice-gas  on  the  triangular  lattice  [32] .  In  the  same  year  Wol¬ 
fram  completed  the  most  detailed  treatment  of  the  basic  theory  including  novel 
symmetry  considerations  and  introduced  the  Boltzmann  approximation.  Frisch 
et  al.  found  the  minimal  lattice  symmetry  needed  to  recover  isotropic  flow  in 
the  continuum  limit  is  a  triangular  lattice  with  a  particle  possessing  six  mo¬ 
mentum  states.  Their  model  is  now  referred  to  as  the  FHP-model  or  hexagonal 
lattice-gas  model.  Accompanying  the  seminal  1986  FHP  paper  was  a  paper  by 
Margolus,  Toffoli,  and  Vichniac  on  cellular-automata  supercomputers  for  fluid- 
dynamics  modeling  [33] .  The  contribution  of  Margolus  et  al.  complements  the 
theoretical  work  of  Frisch  et  al .,  pointing  out  that  with  dedicated  computa- 
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tional  hardware  the  lattice-gas  model  potentially  gains  a  unique  advantage  over 
traditional  methods  of  physical  modeling. 

By  1987  the  lattice-gas  methodology  was  extended  to  model  three-dimensional 
flows.  The  minimal  lattice  found  by  Frisch  et  el.  [34]  was  the  face  centered  hy- 
percubic  (fchc)  lattice.  The  fchc  lattice  with  24-nearest  neighbors  is  projected 
onto  three  dimensions  in  a  simple  fashion  by  limiting  the  depth  of  the  fourth 
dimension  of  the  simulation  volume  to  one  lattice  link.  Research  is  still  un¬ 
derway  on  finding  optimal  collisions  to  minimize  the  viscosity  of  the  fluid  [35], 
however  this  task  has  proven  very  difficult.  The  reason  for  this  difficulty  is  that 
the  fchc  lattice-gas  has  224  or  16.7  million  input  configurations.  In  practice,  all 
possible  collisions  are  not  included  in  a  simulation  because  of  the  large  demand 
for  local  memory  needed  to  pre-store  all  the  necessary  collisional  events  in  table 
look-up  format — an  efficient  format  for  implementing  complex  interactions.  To 
ease  memory  loads,  lattice  isometries  are  exploited  to  reduce  the  size  of  look-up 
tables  [36]. 

The  hope  of  modeling  very  high  Reynolds  number  flows  by  lattice-gas  au¬ 
tomata  methods  has  not  yet  been  realized  with  models  that  do  not  violate 
semi-detailed  balance.  However,  lattice-gas  models  in  the  Boltzmann  approxi¬ 
mation  have  shown  considerably  more  success  in  achieving  high  Reynolds  num¬ 
ber  flows2. 

Recently  the  first  prototype  of  the  next  generation  cellular  automata  ma¬ 
chine,  CAM-8,  has  been  constructed  [37].  The  current  8-module  CAM-8  proto¬ 
type,  with  a  site  density  of  222  16-bit  sites  per  module,  has  a  total  of  32  million 
sites.  Within  the  next  few  years  a  large  CAM-8  sponsored  by  the  US  Air  Force 
will  be  constructed  with  at  least  109  sites  and  will  have  a  computational  rate 
of  approximately  12.5  billion  site  updates  per  second.  This  site  update  rate  is 
about  two  orders  of  magnitude  faster  than  that  achievable  with  current  paral¬ 
lel  computers  such  as  the  Connection  Machine,  CM-5.  In  §7  we  present  some 
simulation  results  on  the  CAM-8  prototype  and  the  CM-5. 
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3  Why  lattice-gases? 


There  are  many  reasons  for  studying  lattice-gases,  both  practical  and  theoreti¬ 
cal.  Some  commonly  cited  reasons  are  oriented  towards  computer  science  and 
issues  related  to  massively  parallel  processing.  There  are  also  appealing  reasons 
related  to  modeling  physical  systems  with  complex  boundary  conditions.  The 
lattice-gas’  attributes  include:  1)  bit  efficiency;  2)  inherent  simplicity;  3)logic 
density;  and  4)  exact  computability. 

Firstly,  lattice-gas  automata  allow  for  high  bit  efficiency.  A  single  digital 
bit  is  used  to  represent  a  particle.  Unlike  in  floating  point  calculations  where 
there  exist  uncontrolled  round-off  errors  in  the  least  significant  bits,  in  lattice- 
gases  all  bits  have  equal  weight,  or  to  quote  Frisch,  there  is  “bit  democracy.” 
Consequently,  the  efficiency  with  which  bits  are  used  may  be  higher  for  lattice- 
gases. 

Secondly,  lattice-gas  automata  possess  an  inherent  simplicity.  Just  as  simple 
models  in  statistical  mechanics,  such  as  the  Ising  model,  shed  light  on  equilib¬ 
rium  critical  phenomena,  so  too  do  lattice-gas  models  shed  light  on  dynami¬ 
cal  phenomena[18].  Moreover,  their  inherent  simplicity  gives  them  pedagogical 
value  since  many  properties  of  macroscopic  systems  can  be  understood  through 
analytical  expressions  given  very  simple  local  rules.  For  example,  lattice-gases 
are  a  simple  way  to  understand  details  of  fluid  systems  such  as  the  dependence 
of  the  shear  viscosity  on  particle  collision  rates.  Computational  fluid  dynam¬ 
ics  codes  are  complicated  and  intricate  in  their  approximations.  Lattice-gases 
are  perhaps  the  simplest  expression  of  Navier-Stokes  flows  and  are  easily  imple¬ 
mented. 

Thirdly,  the  combination  of  bit  efficiency  with  the  simplicity  and  locality 
of  some  lattice-gas  rules  allows — in  principle — nearly  ideal  logic  density.  At 
the  highest  logic  density  that  is  physically  possible,  there  is  the  interesting 
prospect  of  lattice-gas  architectures  built  out  of  “quantum  hardware.”  There  is 
the  expectation  that  in  the  future,  computation  will  be  achieved  on  quantum 
computers  [38,  39,  40,  41].  As  the  fundamental  computational  element’s  size 


reduces  to  nano-scale  ranges  its  behavior  is  governed  by  quantum  mechanics. 
Quantum  mechanics  requires  unitary,  and  hence  invertible,  time  evolution — the 
microscopic  reversibility  of  the  lattice-gas  dynamics  is  important  here.  Even 
before  quantum  mechanics  becomes  a  constraint,  the  reversibility  of  lattice  gas 
dynamics  may  become  a  significant  benefit,  since  at  very  high  logic  densities  the 
dissipation  of  heat  caused  by  irreversible  computations  will  become  an  issue [42, 
43]. 

Fourthly,  lattice-gas  automata  are  exactly  computable.  Richard  Feynman 
[44]  considered  on  a  discrete  spacetime  lattice  “the  possibility  that  there  is  to  be 
an  exact  simulation,  that  the  computer  will  do  exactly  the  same  as  nature” ,  and 
that  using  computers  in  an  exactly  computable  way  may  lead  to  new  possibilities 
in  our  understanding  of  physics.  Although  lattice-gases  cannot  model  quantum 
systems  [44],  they  do  model  classical  systems  while  keeping  mass,  momentum, 
and  energy  exactly  conserved.  Exact  modeling  is  valuable,  for  example,  in  cases 
where  multiparticle  correlations  are  essential  to  the  system’s  behavior.  Lattice- 
gas  simulations  can  verify  theoretical  predictions  beyond  the  Boltzmann  mean- 
field  approximation  of  uncorrelated  collisions:  the  phenomenon  of  long-time  tails 
in  the  velocity  autocorrelation  function  [45,  46,  47]  has  recently  been  observed 
in  lattice  gases  [48,  49,  50]. 

4  Lattice-Gas  Automata 

We  first  define,  in  the  usual  way,  what  a  lattice-gas  cellular  automaton  is.  Then 
we  analytically  treat  the  lattice-gas  in  the  Boltzmann  limit  to  show  that  one 
may  use  strictly  deterministic  local  rules  to  obtain  the  correct  macroscopic  limit. 
We  show  in  particular  that  a  chiral  system  is  adequate  to  obtain  correct  hydro¬ 
dynamics.  We  then  summarize  the  derivation  of  the  equations  of  motion  for  a 
thermal  lattice-gas.  Finally,  we  discuss  our  conventions  for  embedding  a  hexag¬ 
onal  lattice  in  a  square  lattice.  We  show  the  streaming  relations  used  here  for 
local  and  nonlocal  collisions. 


9 


4.1  Some  Preliminaries  about  the  Local  Dyanamics 

Variables  used  are  the  following 


Mass  Unit 
Spatial  Unit 
Temporal  Unit 

Particle  Speed 

Sound  Speed 
#  Momentum  Directions 
Lattice  Vectors 
a 

Particle  Number  Variable 
Distribution  Function 
Collision  Operator 
Jacobian  Matrix 
Number  Density 
Mass  Density 
Bulk  Velocity 
Total  Internal  Energy 


m 

l 

T 

l 

C  =  — 

T 

Cs 

B 

ea 

1,2, ...  ,B 
na 

fa 

Da 

Jab 

n 

p  =  mn 

v 

£ 


Consider  a  spacetime  lattice  with  N  spatial  sites,  unit  cell  size  /,  and  time 
unit  r.  Particles,  with  mass  to,  propagate  on  the  lattice  with  speed  c  =  1/t. 
The  lattice  vectors  are  denoted  by  ea  where  a  =  1, 2, . . . ,  B.  A  particle’s  state 
is  completely  specified  at  some  time,  t,  by  specifying  its  position  on  the  lattice, 
x,  and  its  momentum,  p  =  mcea.  The  particles  obey  Pauli  exclusion  since  only 
one  particle  can  occupy  a  single  momentum  state  at  a  time.  The  total  number  of 
configurations  per  site  is  2B .  The  total  number  of  single  particle  states  available 
in  the  system  is  -/Vtotal  =  BN.  With  P  particles  in  the  system,  we  denote  the 
filling  fraction  by  d  =  P/Nt otal. 

The  number  variable,  denoted  by  na(x,t),  has  the  value  one  if  a  particle 
exists  at  site  x  at  time  t  in  momentum  state  mcea  and  zero  otherwise.  The 
evolution  of  the  lattice-gas  can  be  written  in  terms  of  na  as  a  two-part  collision 
and  streaming  process.  The  collision  part  permutes  the  particles  locally  at  each 
site 

n'a(x,t)  =  ?ra(x,f)  +  Oa(n(x,f)),  (1) 
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where  fi0  represents  the  collision  operator  and  in  general  depends  on  all  the 
particles  at  the  site.  The  streaming  part  permutes  the  particles  globally.  A 
particle  at  position  x  “hops”  to  its  neighboring  site  at  x  +  lea  and  then  time  is 
incremented  by  r 

n'a(x  +  lea,t  +  t)  =  na(x,  t)  +  fl0(n(x,f)).  (2) 

Equation  (2)  is  the  lattice-gas  cellular  automaton  equation  of  motion.  Because 
the  dynamics  only  permutes  the  occupation  of  states,  the  system  is  strictly 
reversible,  see  figure  1. 

4.2  Coarse-Grained  Dynamics 

To  simplify  the  theoretical  analysis  of  the  lattice-gas  dynamics,  it  is  convenient 
to  work  in  the  Boltzmann  limit  where  a  field  point  is  obtained  by  a  block  average 
over  the  number  variables.  That  is,  we  may  define  a  single  particle  distribution 
function,  fa  =  (n0).It  should  be  understood  that  whenever  the  single  particle 
distribution  function  is  written,  its  subscripted  index  is  taken  modulo  B 

fa-\-b  f  mod  B(a+b)'  (3) 

Using  the  Boltzmann  molecular  chaos  assumption  the  averaged  collision  opera¬ 
tor  simplifies  to  (fla)  =  fi0((?r)),  and  by  a  Taylor  expansion  (2)  we  obtain  the 
lattice  Boltzmann  equation 


(h  fa  T  C€aidifa  1 1  a  • 


(4) 


A  careful  treatment  of  this  procedure  is  given  by  Frisch  et  al.  [34].  A  general 
collision  operator  is  constructed  as  follows 


{Cil 


(5) 


where  {£;}  is  a  set  of  occupied  particle  states  and  a  =  ±1  is  a  scalar  coefficient 
and  where  each  term  in  the  sum  is  written  in  factorized  form  as 

fa+ik  B 


Qa (f  1  >  •  •  •  Ufc)  — 


fa+il 

1  fa+ii  1  fa+ 


(6) 
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We  expand  the  distribution  function  about  its  equilibrium  value,  /eq 


fa  =  r  +  5  fa 


(7) 


so  that,  to  first  order,  we  have 


fia(r)  =  £  -gjr-6fb.  (8) 

The  l.h.s.  of  (8)  must  vanish,  since  the  particle  distribution  is  non-changing 
under  equilibrium  conditions.  The  eigenvalues  of  the  Jacobian  of  the  collision 
operator, 


Jab  — 


dilg 

dfb ’ 


(9) 


can  be  calculated  and  the  number  of  these  that  vanish  must  equal  the  number 
of  invariant  quantities  in  the  lattice-gas  dynamics.  Because  of  the  finite-point 
group  symmetry  of  the  spatial  lattice,  the  Jacobian  matrix  will  be  circulant,  its 
elements  can  be  specified  by  the  difference  of  their  indices,  Jab  =  Ja-b-  This 
property  of  the  Jacobian  simplifies  the  solution  of  the  eigenvalue  equation 


£ja_fe^  =  Afce  (10) 

b 

where  k  =  1 , . . .  ,B.  Let  us  make  the  ansatz  that  the  eigenvectors  have  the 
following  form 

£ k  iak/B  (11) 

Then  inserting  (11)  into  (10)  and  taking  m  =  a  —  6,  gives 

A k  =  ^JmeMmklB.  (12) 

m 

4.3  Triangular  Lattice:  B=6 

When  implementing  a  lattice-gas  on  a  parallel  computer  it  is  most  convenient 
to  use  deterministic  updating  rules.  This  is  important  for  several  reasons.  First 
of  all,  using  deterministic  rules,  the  lattice-gas  possesses  a  strict  time-reversal 
invariance.  Therefore,  it  mimics  the  time-reversal  invariance  of  natural  physical 
laws.  As  a  practical  matter,  the  reversibility  allows  one  to  run  the  gas  dynamics 
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forward  to  some  state  and  then  back  to  its  initial  state.  This  is  a  good  way 
to  check  if  the  local  rules  are  coded  correctly.  Second  of  all,  the  generation 
of  random  numbers  typically  takes  times  and  using  random  bits  increases  the 
amount  of  state  that  the  rule  must  deal  with.  Therefore,  deterministic  local 
rules  are  used  here.  Lattice-gas  collisions  can  be  catergorized  with  even  or  odd 
chirality.  The  method  employed  here  uses  even  chirality  collisions  on  even  time 
steps  and  odd  chirality  collisions  on  odd  time  steps,  thereby  eliminating  the 
need  for  a  random  coin  toss.  The  validity  of  such  a  partitioning  of  the  collisions 
must  be  justified.  This  may  be  done  in  a  straight  forward  way  using  the  results 
of  §4.2.  The  simplest  hydrodynamic  example  is  a  definite  chirality  hexagonal 
lattice-gas,  almost  identical  to  the  usual  FHP  gas  except  no  random  coin  toss  is 
made.  For  a  hexagonal  lattice,  B  =  6,  the  eigenvectors  of  the  Jacobian  matrix, 


(11),  are  simply  composed  of  1  plus  the  three  roots  of  -1 

Co  =  (1,1, 1,1, 1,1)  (13) 

6  =  (e,  £*,  —1,  e,  e*,  1)  (14) 

6  =  (e*,e,l,e*,e,l)  (15) 

^  =  (-1,1, -1,1, -1,1)  (16) 

£4  =  (e,e*,l,e,e*,l)  (17) 

£5  =  (e*,  e,  —1,  e* ,  e,  1),  (18) 

where  e  =  e*3 .  The  collision  operator  that  produces  2-body  and  3-body  sym¬ 
metric  collision  is  the  following 

0o  =  Qa(  1, 4)  -  Qa( 0, 3)  +  Qa(  1, 3, 5)  -  Qa( 0, 2, 4),  (19) 

which  is  written  in  expanded  form  as 


Oo  =  -  (/o  (1  -  /1)  (1  -  /2)  h  (1  -  k)  (1  -  h))  + 
(1  -  /o)  h  (1  -  h)  (1  -  h)  k  (1  -  h)  - 
fo  (1  —  /l)  fl  (1  —  f-i)  fi  (1  -  fb)  + 

(i  -  f0)  h  (i  -  h)  h  (i  -  k)  h 
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Using  (9)  the  Jacobian  may  be  calculated3 

J  =  circ[(l  —  f)2  /,  —(1  —  /)2  /,  (—1  +  /)2  /2, 

(1-2/)  (1  — /)2 /,  (1  — /)2  /  (—1  +  2/), 

-(l-/)2/2] 

Using  (12),  the  eigenvalues  of  J  may  be  directly  calculated 

A0  =  0 

Ai  =  0 

A2  =  2  e  (1  +  e)  (—1  +  /)3  / 

A3  =  -6(1  -/)2/2 

A4  =  2e(l  +  e)  (l-/)3/ 

A5  =  0 

There  are  only  three  zero  eigenvalues,  so  the  deterministic  FHP-type  lattice-gas 
model  possesses  only  three  invariants:  the  total  mass  and  the  two  components 
of  momentum.  The  methodology  of  successively  switching  between  left  and 
right-handed  collision  tables  is  therefore  justified,  at  least  in  the  Boltzmann 
limit.  Switching  between  left  and  right-handed  collision  tables  is  done  on  the 
CAM-8  since  there  is  no  additional  time  or  memory  cost  (per  module)  incurred 
in  using  multiple  tables.  However,  since  storing  multiple  lookup  tables  costs 
additional  memory  on  the  CM-5,  in  practice  we  use  only  a  single  collision  table, 
and  therefore,  our  CM-5  simulations  use  a  chiral  lattice-gas.  In  fact,  the  Von 
Karman  street  simulation  in  §7  presented  in  figure  9  is  an  example  of  a  chiral 
lattice-gas. 

4.4  Single  Particle  Multispeed  Fermi-Dirac  Distribution 
Function 

It  is  essential  to  verify  that  in  the  macroscopic  limit,  the  cellular  automaton 
equation  of  motion  (2)  leads  to  Navier-Stokes  hydrodynamics.  To  verify  this, 
we  begin  with  the  most  general  form  of  the  single  particle  distribution  func¬ 
tion,  appropriate  for  even  multispeed  lattice-gases:  the  Fermi-Dirac  distribution. 
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Fundamentally,  this  arises  because  the  individual  digital  bits  used  to  represent 
particles  satisfy  a  Pauli-exclusion  principle.  Therefore,  the  distribution  must  be 
written  as  a  function  of  the  sum  of  scalar  collison  invariants,  a  +  de^Vi  + 
implying  the  following  form 


fa  = 

J  a 


1 


X  -|-  ea+Peaivi+7e<r 


(20) 


Using  the  identities  in  the  appendix,  an  expansion  to  fourth  order  of  (20)  about 
zero  velocity  is  the  following 


fa  =  da 

d(j ( 1  -  da)(3 leJF  — 
c 

1  V ^ 

--(4(1  -  da)(a2  + 

+±da(l-d<7)(l-2da)fte:ie:jV^ 

-l-d<T{l-da)kel^~ 

+  \da{  1  -  <4)(  1  -  2c4)/?i(a2  +7/2ea)eaai 

-  d*){l  -  6 da  +  6dl)pfe°ie°je°kVlV^3Vk 

+0(v 4) 

where  da  =  |v-o-  The  coarse-grain  averaged  dynamics  depend  on  the  follow¬ 

ing  dynamical  variables. 

Particle  number  density: 

mJ2fa=P,  (21) 

a, a 

Momentum  density: 

mC^2eaifa  =  PVi’  (22) 

a, a 

Moment  density  flux  tensor: 

me2  Y  Cieljfl  =  n ij-  (23) 

a, a 

Total  energy  density,  half  the  trace  of  the  momentum  flux  tensor: 

ne  =  a.  (24) 
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Pressure  tensor,  P: 


hi  =  mJ2  fa  ( ceaai  -  Vi)  (ce*  •  -  vj) .  (25) 

a, a 

Heat  flux,  q: 

qi  =  m^2fZ  ( ceaai  -  Vi)2  (ceZj  -  vj)  .  (26) 

a,cr 

In  equilibrium,  the  cellular  automaton  dynamical  equation  (4)  reduces  to 

9tfZ  +  <idifZ  =  0.  (27) 

(27)  implies  3  conservation  equations.  To  obtain  these  equations,  the  identities 
for  isotropic  lattice  vectors  given  in  the  appendix  are  necessary.  Using  (21)  and 
(22)  in  (27)  gives  continuity  (mass  conservation): 

dtp+  di(pvi)  =  0.  (28) 

Using  (22)  and  (23)  in  (27)  gives  the  Navier-Stokes  equation  (momentum  con¬ 
servation)  : 

dt(pvi)  +  dj(pgviVj)  =  -dtp  +  r\d2Vi.  (29) 

Using  (25)  and  (26)  in  (27)  gives  the  heat  equation  (energy  conservation): 

dt  {ns)  +  di(nevi)  +  ^ dzqi  +  ()l{r,l',l  \  =  0,  (30) 

where 

P=  (7-  l)(ne-  ^pgv2)  (31) 

and  7  =  <^r-  or  7  —  1  =  ^ .  Although  the  lattice-gas  may  in  principle  be 
comprised  of  an  indefinite  number  of  speeds,  from  (31)  we  see  that  the  pressure 
depends  upon  the  square  of  the  bulk  velocity,  i.  e.  it  is  the  difference  of  the  total 
internal  energy  of  the  lattice-gas  minus  the  bulk  kinetic  energy.  In  a  single  speed 
lattice-gas,  this  kind  of  velocity  dependence  is  anomalous  and  is  a  well  known 
deficiency  of  the  lattice-gas.  However,  for  a  multispeed  lattice-gas  the  existence 
of  this  term  takes  on  a  physical  interpretation.  For  a  classical  ideal  gas,  the 
pressure  is  proportional  to  both  the  sound  speed  squared  and  the  temperature 

p  =  pc 2  =  nksT.  (32) 
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Since  p  =  mn ,  equating  (31)  with  the  ideal  gas  law  (32)  gives  the  total  internal 
energy  in  terms  of  the  bulk  kinetic  energy  \mv2  and  the  local  particle  thermal 
energy  kBT 


£  = 


9  2 
—mv 
2 


kBT 

7-1’ 


(33) 


The  Navier-Stokes  equation  (29),  the  pressure  (31),  and  the  total  energy  (33)  all 
explicitly  have  a  factor  g  in  them.  In  the  Boltzmann  limit,  the  factor  g  depends 
on  the  particle  speeds,  cCT,  and  on  the  density  distribution  per  speed,  da,  by  the 
following  complicated  expression 


E.  dp.  Ba{c-ffda{  1  -  da)(  1  -  2  da) 

'Y^Ba{^yda{i-da)]2 


(34) 


If  g  =  1,  then  a  multispeed  lattice-gas  automata  would  exactly  solve  the  ideal 
fluid  equations  where  the  physical  interpretation  of  the  total  internal  energy 
would  then  be  that  it  partitions  into  a  bulk  motion  term,  or  kinetic  energy, 
and  a  fluctuating  motion  term,  or  random  heat  energy  associated  with  a  cer¬ 
tain  gas  temperature.  Therefore,  the  multispeed  lattice-gas  calculation,  in  the 
Boltzmann  limit,  would  exactly  agrees  with  classical  kinetic  gas  theory,  see  the 
expression  for  the  partial  pressure  of  an  electron  gas  given  by  Li  and  Wu  [51]. 
The  factor  g  approachs  one  only  as  the  number  of  speed  in  the  lattice-gas  model 
becomes  large.  However,  for  a  small  number  of  speeds,  a  rescaling  of  the  vari¬ 
ables  recovers  the  exact  dynamics.  A  similar  observation  has  been  made  by 
Teixeira  [52]  in  his  investigation  of  multispeed  lattice-gas  automata  models. 


4.5  Hexagonal  Lattice 


In  a  hexagonal  lattice  there  are  six  lattice  vectors  which  we  enumerate  by  the 


following  convention 


ea 


(si 


ira 


7ra> 


sm  — ,  cos  — 
3  ’  3y 


(35) 


where  a  =  1,2, ...  ,6.  The  spatial  coordinates  of  the  lattice  sites  may  be  ex¬ 


pressed  as  follows 


X.y 


1 

2 


( j  mod  2) 


(36) 
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where  i  and  j  are  rectilinear  indices  which  specify  the  data  memory  array  loca¬ 
tion  used  to  store  the  lattice-gas  site  data.  For  a  multispeed  lattice-gas  it  is  nec¬ 
essary  to  shift  data  more  than  one  lattice  length.  Let  s  =  ( j  mod  2 )(r  mod  2). 
Given  a  particle  at  site  (i,  j),  it  may  be  shifted  to  a  site  r  lattice  units  away  to 
a  remote  site  ( i',j ')  by  the  following  mapping 


(i'J')i  = 

(*+  2  ^  r) 

(37) 

(i'd'h  = 

(38) 

(i'j'h  = 

(*  -  r,j) 

(39) 

4  = 

(  .  T  +  1  .  \ 

l*+  2  s,j+r\ 

(40) 

= 

(*-2  ~s,j  +  r) 

(41) 

(*',/)  e  = 

( i  +  r,j ) 

(42) 

where  ( i',j')a  denotes  the  shifted  site,  i.e.  ( i,j )  —>  with  a  shift  along 

vector  r  =  rea  and  where  division  by  2  is  considered  integer  division. 

These  streaming  relations  are  useful  for  implementing  a  lattice  gas  in  a 
structured  language  such  as  the  C-language.  Our  implementation  on  the  CM-5 
in  the  C-language  and  DPEAC  use  these  relations  for  all  address  computations. 
In  these  streaming  relations,  the  modulus  operator  is  base  2  because  a  two- 
dimensional  hexagonal  lattice  embedded  into  a  square  three-dimensional  mesh 
is  pleated. 

The  simplest  way  to  see  this  embedding  is  to  define  z  =  ( j  mod  2).  Therefore 
the  third  dimension  along  the  z-axis  is  narrow,  only  one  lattice  distance  wide. 
Half  of  the  lattice  sites  are  at  z  =  0  and  the  other  half  are  at  z  =  1.  This  divides 
the  hexagonal  lattice  into  two  sublattices  that  we  refer  to  as  pleat  0  and  pleat  1. 
Table  1  lists  the  components  of  the  data  translation  vectors,  or  kicks,  for  each 
stream  direction,  a  =  1,  2, . . . ,  6,  for  both  pleats.  This  kick  table  was  used  for 
the  CAMForth  implementation  on  the  CAM-8  and  the  C*  implementation  on 
the  CM-5.  This  is  equivalent  our  general  streaming  relations  for  the  case  when 
r  =  1.  The  usefulness  of  this  kind  of  embedding  is  that  if  the  data  for  each 
one  of  the  sublattices  is  rendered  for  display,  it  can  be  drawn  in  simple  raster 
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form  and  fluid  structures  will  appear  correctly,  i.e.  a  sound  pulse  will  appear 
circular. 

5  The  Cellular  Automata  Machine  CAM-8 

The  cellular  automata  machine  CAM-8  architecture  devised  by  Norman  Mar- 
golus  of  the  MIT  Laboratory  for  Computer  Science  [37,  22]  is  the  latest  in 
a  line  of  cellular  automata  machines  developed  by  the  Information  Mechanics 
Group  at  MIT  [28,  4,  33].  It  is  optimized  for  performing  lattice-gas  simulations. 
The  CAM-8  architecture  itself  is  a  simple  abstraction  of  lattice  gas  dynamics. 
Lattice  gas  data  streaming  and  collisions  are  directly  implemented  in  the  ar¬ 
chitecture.  The  communication  network  is  a  cartesian  three-dimensional  mesh. 
Crystallographic  lattice  geometries  can  be  directly  embedded  into  the  CAM-8. 
Each  site  of  the  lattice  has  a  certain  number  of  bits  (a  multiple  of  16)  which  we 
refer  to  as  a  “cell” .  Each  bit  of  the  cell,  or  equivalently  each  bit  plane  of  the 
lattice,  can  be  translated  through  the  lattice  in  any  arbitrary  direction.  The 
translation  vectors  for  the  bit  planes  are  termed  “kicks”.  The  specification  of 
the  x,y,  and  z  components  of  the  kicks  for  each  bit  plane  (or  hyperplane)  exactly 
defines  the  lattice.  An  interesting  property  of  the  architecture  is  that  the  kicks 
can  be  changed  during  the  simulation.  Therefore,  the  data  movement  in  the 
CAM-8  can  be  quite  general.  Once  the  kicks  are  specified,  the  coding  of  the 
lattice-gas  streaming  is  completed.  In  effect,  the  kicks  determine  all  the  global 
permutations  of  the  data. 

Local  permutations  of  data  occur  within  the  cells.  These  permutations  are 
the  computational  metaphor  for  physical  collisions  between  particles4.  All  local 
permutations  are  implemented  in  look-up  tables.  That  is,  all  possible  physical 
events  with  a  certain  input  configuration  and  a  certain  output  configuration 
are  precomputed  and  stored  in  SRAM,  for  fast  table  look-up.  The  width  of  the 
CAM-8  look-up  tables  are  limited  to  16-bits,  or  64K  entries.  This  is  a  reasonable 
width  satisfying  the  opposing  considerations  of  model  complexity  versus  memory 
size  limitations  for  the  SRAM.  Site  permutations  of  data  wider  than  16-bits 
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must  be  implemented  in  several  successive  table  look-up  passes.  Since  the  look¬ 
up  tables  are  double  buffered,  a  scan  of  the  space  can  be  performed  while  a  new 
look-up  table  is  loaded  for  the  next  scan. 

Figure  4  is  a  schematic  diagram  of  a  CAM-8  system.  On  the  left  is  a  single 
hardware  module — the  elementary  “chunk”  of  the  architecture.  On  the  right 
is  an  indefinitely  extendable  array  of  modules  (drawn  for  convenience  as  two- 
dimensional,  the  array  in  normally  three-dimensional).  A  uniform  spatial  calcu¬ 
lation  is  divided  up  evenly  among  these  modules,  with  each  module  simulating 
a  volume  of  up  to  millions  of  fine-grained  spatial  sites  in  a  sequential  fashion. 
In  the  diagram,  the  solid  lines  between  modules  indicate  a  local  mesh  intercon¬ 
nection.  These  wires  are  used  for  spatial  data  movements.  There  is  also  a  tree 
network  (not  shown)  connecting  all  modules  to  the  front-end  host,  a  SPARC 
workstation  with  a  custom  SBus  interface  card,  that  controls  the  CAM-8.  It 
downloads  a  bit-mapped  pattern  as  the  initial  condition  for  the  simulations.  It 
also  sends  a  “step-list”  to  the  CAM-8  to  specify  the  sequence  of  kicks  and  scans 
that  evolve  the  lattice-gas  in  time.  One  can  view  the  lattice-gas  simulation  in 
real-time  since  a  custom  video  module  captures  site  data  for  display  on  a  VGA 
monitor,  a  useful  feature  for  lattice-gas  algorithm  development,  test  and  eval¬ 
uation.  The  CAM-8  has  built-in  25-bit  event  counters  so  that  measurements 
can  be  done  in  real-time  without  slowing  the  lattice-gas  evolution.  We  have 
used  this  feature  to  do  real-time  coarse-grain  block  averaging  of  the  lattice-gas 
number  variables  and  to  compute  the  components  of  the  momentum  vectors  for 
each  block.  The  amount  of  coarse-grained  data  is  sufficiently  small  that  it  can 
be  transferred  back  to  the  front-end  host  for  graphical  display  as  an  evolving 
flow  field  within  an  X- window.  See  figures  7,  8  and  9  for  example  flow  fields. 

6  The  Connection  Machine  CM-5 

The  Thinking  Machines  Corporations ’s  CM-5  is  a  massively  parallel  computer 
that  contains  up  to  16384  processing  nodes[53]5.  Figure  5  shows  a  processing 
node  consisting  of  a  SPARC  CPU,  32  Mbytes  of  memory  and  4  Vector  processing 
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units.  These  processing  nodes  are  all  connected  via  a  “fat-tree”  communications 
net  that  allows  fast  inter-node  communication.  These  processing  nodes  are 
controlled  by  a  front-end  host  computer  which  is  a  modified  SUN  workstation. 
The  SPARC  processor  on  each  node  issues  instructions  to  the  vector  units  and 
performs  most  bookkeeping  tasks  while  the  vector  units  perform  arithmetic  and 
logical  operations  on  the  data.  Each  vector  unit  has  a  peak  rate  of  32  million 
64-bit  ops  (floating  point  or  integer)  for  a  combined  total  of  128  Mops/node. 
Each  node’s  memory  is  divided  into  8  Mbyte  banks,  one  for  each  vector  unit. 
Each  vector  unit  has  it’s  own  independent  128  Mbyte/sec  path  to  memory  for 
a  combined  memory  bandwidth  of  512  Mbyte/sec  for  each  node.  The  CM-5 
at  the  Army  High  Performance  Computing  Research  Center  in  Minneapolis, 
Minnesota  contains  544  nodes  for  a  total  of  16  Gb  of  memory  and  64  Gops  of 
peak  processing  speed. 

We  have  also  implemented  a  lattice  gas  simulator  on  the  CM-5  in  a  multiple 
instruction  multiple  data  (MIMD)  style.  The  CMMD  message  passing  library 
is  used  for  inter-node  communication  and  host-node  interaction.  In  order  to  get 
the  highest  possible  performance  we  explicitly  manipulate  the  vector  units  on 
each  node  using  their  assembler  language  known  as  DPEAC.  To  ease  the  burden 
of  hand  coding  the  vector  units  a  macro  package  known  as  GCC/DPEAC  is  used. 
This  package  uses  features  available  in  the  GNU  C  compiler  to  issue  assembler 
language  instructions  from  ANSI  C  and  simplifies  matters  considerably. 

We  partition  the  problem  space  into  equally  sized  rectangular  units.  Each 
processing  node  is  responsible  for  updating  one  of  these  rectangular  units.  This 
partitioning  allows  one  to  send  a  small  number  of  long  messages  to  connect 
the  space  together.  Inter-node  communication  is  only  necessary  along  one  of 
the  axes  of  the  problem  space.  Since  the  inter-node  communications  network  is 
optimized  for  long  message  lengths  we  expect  that  this  partitioning  will  make 
the  effective  use  of  available  communications  bandwidth.  Within  a  processing 
node,  each  of  the  4  vector  units  is  responsible  for  updating  it’s  quarter  of  the 
space.  Communication  between  each  vector  unit’s  8  Mbyte  bank  of  memory  is 
mediated  by  the  SPARC  processor. 
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There  are  two  distinct  phases  of  a  lattice  gas  update  cycle.  The  first  phase 
is  the  collision  phase  where  particles  can  interact.  The  second  phase  involves 
streaming  of  the  bits  to  their  new  locations,  consistent  with  their  velocity  and 
the  lattice  on  which  the  simulation  is  being  performed.  In  most  lattice  gas 
models  all  collisions  can  happen  concurrently  and  all  sites  can  stream  their  data 
concurrently  as  well.  The  collision  phase  can  be  handled  via  look  up  tables 
(LUT’s)  for  16  bit  sites.  The  LUT  is  attractive  in  that  it  can  be  an  extremely 
simple  and  fast  update  mechanism. 

We  have  distributed  the  LUTs  throughout  the  machine,  indeed  each  vector 
unit  has  it’s  own  copy  of  the  LUT.  Figure  6  shows  the  memory  layout  on  each 
node.  During  the  collision  phase  each  vector  unit  fetches  all  the  sites  in  it’s 
partition  of  the  problem  space  and  runs  them  through  it’s  copy  of  the  LUT. 
Since  each  vector  unit  has  it’s  own  independent  128  Mbyte/sec  data  path  to 
a  bank  of  memory,  this  operation  can  be  performed  extremely  rapidly.  With 
this  high  degree  of  parallelism  the  LUT  operation  consumes  a  small  fraction 
of  the  time  necesary  to  update  the  space.  As  the  number  of  bits  of  site  data 
grows  beyond  16  [64K  entries]  the  LUT’s  begin  to  consume  too  much  memory. 
For  models  that  involve  larger  quantities  of  site  data  ( i.e .  #  bits  >  20)  other 
methods  involving  LUT  compression/decompression  need  to  be  used  for  the 
collision  phase.  [54] 

The  streaming  phase  is  more  complex.  The  approach  taken  here  is  to  hold 
the  address  of  each  bit’s  destination  in  a  pre-computed  table.  These  tables 
may  be  computed  in  ordinary  C,  which  is  advantageous  for  changing  from  one 
model  to  another.  Additionally,  potentially  complex  addressing  calculations  are 
performed  only  once,  during  initialization.  Before  a  site  can  be  updated,  a  com¬ 
munication  phase  must  take  place  so  that  each  site  can  access  all  it’s  neighbors. 
Communication  must  take  place  across  node  and  vector  unit  boundaries.  The 
communication  is  done  so  that  every  site  has  access  to  all  it’s  neighbor  values 
on  one  vector  unit’s  8  Mbyte  bank  of  memory.  To  update  a  particular  site  each 
vector  unit  first  loads  all  the  neighbors  of  the  site  to  be  updated  from  the  ap¬ 
propriate  areas.  Then  the  relevant  bits  from  each  neighbor  site  are  extracted 
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sequentially  to  build  up  the  new  site  value.  The  new  site  value  is  then  written 
to  memory. 

After  implementing  a  lattice-gas  simulator  with  the  above  considerations  in 
mind  we  find  that  we  can  achieve  update  rates  on  the  order  of  550  Msites/sec 
on  a  256  node  partition  of  the  CM-5  for  the  FHP  gas  model.  This  timing  was 
done  using  a  32K  x  2K  lattice.  We  find  that  the  longer  the  system  is  across 
each  node  the  greater  the  performance  realized.  This  is  due  to  the  fact  that 
long  system  sizes  across  each  node  increase  the  fraction  of  sites  in  the  interior  of 
each  vector  unit  that  do  not  need  to  communicate  with  sites  on  adjacent  vector 
units  or  processing  nodes. 

7  Gallery  of  Computational  Results 

7.1  Rayleigh-Benard  Convection  on  the  CAM-8 

A  well  known  fluid  instability  of  a  thermohydrodynamic  system  is  Rayleigh- 
Benard  convection  [55,  56].  Rayleigh-Benard  convection  is  popular  because  one 
can  observe  the  onset  of  order  and  then  the  transition  to  chaos  in  the  flow 
patterns  [57,  58]. 

Our  implementation  of  the  two-speed  hexgonal  lattice-gas  with  a  rest  par¬ 
ticle,  includes  gravitational  forcing,  free-slip  and  no-slip  boundaries  which  may 
be  oriented  horizontally,  vertically,  or  inclined  ±60°,  and  heating  and  cooling 
sites  in  order  to  model  temperature  controlled  boundary  surfaces.  This  has  been 
encoded  within  the  site  data  space  of  16-bits  per  site  for  simple  implentation  on 
the  CAM-8.  The  ability  of  encoding  such  complex  dynamics  within  16-bits  is 
one  of  the  remarkable  aspects  of  the  lattice-gas  formalism  in  terms  of  efficient 
memory  use  affording  us  the  ability  to  do  flash  updating  from  prestored  colli¬ 
sion  tables.  98%  of  the  216  entries  in  the  collision  tables  are  used  (i.e.,  not  the 
identity)  in  this  model.  Similar  lattice  gas  models  have  been  implemented  by 
Burges  and  Zaleski  [59]  and  by  Chen  et  al.  [9]  and  Ernst  and  Das  [60]. 

To  optimize  the  collision  frequency  between  the  fast  and  slow  particles  we 
have  chosen  their  momenta  to  be  of  unit  value.  That  is,  the  slow  particles  have 
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unit  mass,  mi  =  1  and  the  fast  particles  have  half  the  mass,  m2  =  1/2.  In  this 
way,  pi  =  P2  =  1  and  their  energies  are  E 1  =  1  and  E2  =  2.  With  this  con¬ 
vention  we  have  the  usual  FHP-type  collisions  [32]  between  the  different  speed 
particles  while  conserving  mass,  momentum,  and  energy.  These  include  head-on 
2-body  collisions,  three-body  collisions,  collisions  with  spectators,  etc.  Grosfils, 
Boon  and  Lallemand  have  introduced  a  three-speed  thermohydrodynamic  gas 
with  speeds  1,^,2  and  a  rest  particle  [61].  With  this  19-bit  model,  efficient 
collisional  mixing  can  occur  with  all  particle  having  the  same  unit  mass.  Since 
the  particles  may  now  carry  different  units  of  energy,  in  addition  to  the  equation 
of  continuity  and  Euler’s  equation,  in  this  system  we  have  an  energy  transport 
equation. 

7.2  Kelvin- Helmholtz  Instability  on  the  CAM-8 

Another  well  known  fluid  instability  is  the  Kelvin-Helmholtz  shear  instability. 
Figure  8  shows  the  a  simulation  of  a  shear  instability  on  a  hexagonal  lattice 
4096  x  2048  in  size  with  toroidal  boundary  conditions.  A  momentum  map  is 
overlayed  on  a  vorticity  map.  Clockwise  vorticity  is  shaded  red  and  counter¬ 
clockwise  vorticity  is  shaded  blue.  The  initial  conditions  for  the  simulation  are 
very  uniform,  see  Figure  8a.  A  gas  density  is  chosen,  in  this  case  approximately 
1/7  filling,  and  two  horizontal  regions  are  set  with  uniform,  but  opposing  flow 
directions.  That  is,  the  majority  of  the  fluid,  the  background  region,  is  set  with 
a  uniform  flow  velocity  of  approximately  0.4  cs  (Mach  0.4)  flowing  to  the  right. 
A  narrow  stripe  256  sites  wide  is  set  in  the  center  of  the  space  flowing  to  the  left 
at  -0.4  cs.  No  sinusoidal  perturbation  is  given  to  the  counter-flow  narrow  stripe 
region  as  in  previous  lattice-gas  simulations  [62].  No  external  forcing  is  applied 
during  the  simulation  run.  The  only  perturbation  is  caused  by  minor  fluctations 
produced  by  the  random  number  generator  when  producing  a  uniform  fluid 
density.  After  approximately  10,000  time  steps,  the  narrow  horizontal  center 
stripe  forms  a  sinusoidal  pattern.  The  sinsoid’s  amplitude  grows,  form  a  wave 
that  eventually  breaks  into  several  counter-rotating  vortices  and  the  two  flow 
regions  begin  to  substantially  mix.  Figure  8  shows  the  initial  state  of  the  fluid 
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and  then  the  resulting  states  at  10,000  and  30,000  time  steps.  By  t  =  30, 000  the 
formation  of  a  wave  is  apparent.  Eventually  after  400,000  time  steps,  the  fluid 
attains  a  uniform  flow  to  the  right  after  the  system  has  equilibrated,  exactly 
conserving  the  momentum  in  the  initial  configuration. 

7.3  Von  Karman  Streets  on  the  CM-5 

Figure  9  shows  a  simulation  of  vortex  shedding  from  a  flat  plate  after  32,000 
time  steps  on  a  hexagonal  lattice  4096  x  2048  in  size.  A  momentum  map  is 
overlayed  on  a  vorticity  map.  Clockwise  vorticity  is  shaded  red  and  counter¬ 
clockwise  vorticity  is  shaded  blue.  A  flat  plate  obstacle  is  placed  in  a  channel  of 
fluid  with  a  flow  directed  towards  the  right  of  the  figure.  The  fluid  flow  is  forced 
by  completely  reconstructing  the  fluid’s  velocity  distribution  at  each  time  step 
in  a  “forcing  strip”  at  the  left  of  the  channel.  This  forcing  method  prevents 
sound  waves  and  other  disturbances  from  propagating  around  the  torus  in  the 
flow  direction  and  overwhelming  the  flow  behavior.  The  boundary  conditions 
are  effectively  cylindrical. 

The  flow  is  started  from  a  random  distribution  of  particles  at  the  appropriate 
density  with  a  net  velocity  close  to  that  of  the  steady  state  flow.  Since  this  is  not 
a  true  equilibrium  starting  condition,  some  transient  behavior  appears  in  the 
form  a  sound  pulse  that  propagates  down  the  channel.  This  pulse  is  absorbed 
by  the  forcing  strip.  After  2000  time  steps  the  system  is  equilibrated  with  no 
transient  phenomena  visible.  This  equilibration  time  is  very  short  compared 
with  the  time  necessary  for  vortex  development.  The  cylindrical  boundary  con¬ 
dition  appears  to  work  extremely  well,  allowing  one  to  run  the  simulation  to 
long  times. 

8  Discussion 

We  have  presented  a  theoretical  description  of  lattice-gas  automata  in  a  dis¬ 
crete  D-dimensional  space.  Attention  was  focused  on  two-dimensional  fluids 
for  numerical  simulation.  We  have  implemented  some  lattice-gas  fluids  on  a 
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new  prototype  cellular  automata  machine,  the  CAM-8.  Identical  models  on  the 
CM-5  were  also  implemented.  To  illustrate  lattice-gas  dynamics  in  the  macro¬ 
scopic  limit,  several  fluid  instabilities  were  tested:  (1)  the  Rayleigh-Benard  con¬ 
vective  instability  just  after  the  critical  Rayleigh  number  has  been  reached  in 
the  system  by  suitable  gravitational  forcing  and  temperature  gradient;  (2)  the 
Kelvin-Helmhotz  shear  instability;  and  (3)  the  Von  Karman  vortex  shedding 
instability. 

The  strength  of  the  CAM-8  in  these  simulations  is  that  it  is  optimized  for 
fine-grained  spatial  calculations.  It  can  handle  many  lookup-tables  because  of  its 
double  buffering.  It  can  perform  data  streaming  by  spatial  data  shifts  without 
slowing  down  the  simulation.  For  a  multispeed  system  or  a  lattice-gas  with  long- 
range  interactions,  large  shifts  are  necessary.  The  interaction  neighborhood  on 
the  CAM-8  need  not  be  local:  data  can  be  shifted  to  the  nearest  neighbor  or  a 
distance  thousands  of  sites  away  with  the  same  computational  effort. 

The  strengths  of  the  CM-5  in  these  simulations  are  its  software,  size,  and 
flexibility.  CM-5  applications  can  be  coded  largely  in  standard  programming 
languages  such  as  C  and  FORTRAN-90.  CM-5  machines  offer  the  possibility 
of  simulating  enormous  problem  spaces  due  to  their  much  larger  memories.  In 
our  implementation,  streaming  is  done  in  the  CM-5  by  address  maps  preloaded 
into  memory.  Although  this  uses  a  lot  of  memory,  the  flexiblity  of  precomputed 
addressing  tables  simplifies  the  implementation  of  complex  lattice  geometries. 

The  main  theoretical  and  computational  points  of  this  paper  are: 

1.  Deterministic  microscopic  lattice-gas  dynamics  produce  the  correct  Navier- 
Stokes  hydrodynamics  in  the  macroscopic  limit  where  no  randomness  is  used  in 
the  local  rules.  Treating  individual  digital  bits  as  multispeed  fermionic  particles 
allows  one  to  simulate  fluid  systems  where  mass,  momentum,  and  energy  are 
exactly  conserved  and  where  the  dynamics  has  a  time-reversal  invariance. 

2.  A  workstation-scale  CAM-8  prototype  is  an  efficient,  cost-effective  plat¬ 
form  for  lattice  gas  problems  that  rivals  the  capabilities  of  extant  parallel  su¬ 
percomputers. 

3.  The  lattice  gas  simulation  method  may  be  directly  ported  to  a  variety 
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of  parallel  computer  architectures:  the  facilities  provided  by  most  parallel  su¬ 
percomputers  are  suitable  for  efficiently  running  lattice  gas  applications.  Com¬ 
munication  costs  for  running  lattice-gas  automata  simulations  decrease  as  the 
problem  being  run  increases  in  size  and  can  essentially  be  neglected  for  large 
systems. 

In  closing,  it  is  interesting  that  the  massively  parallel  Connection  Machine 
2,  to  date  has  achieved  some  of  the  best  update  rates  for  lattice-gases,  for  exam¬ 
ple,  700-750  million  sites  per  second  update  rate  for  the  FHP  lattice-gas  [63,  23]. 
Building  a  multiple  instruction  multiple  data  Connection  Machine  5,  although 
more  focused  on  general  messaging,  has  improved  upon  its  predecessor’s  per¬ 
formance  (for  the  FHP  lattice-gas,  currently  the  CM-5  could  exceed  2  billion 
updates  per  second  on  a  1024-node  partition).  Although  machines  like  the  CM- 
5  can  solve  a  wider  class  of  computational  problems,  we  believe  that  effort  spent 
on  exploring  locally  interacting  automata  models  on  fine-grained  architectures 
will  lead  to  new  practical  methods  for  accurately  modeling  physics. 
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Identities  for  Isotropic  Lattice  Tensors:  life"  1 

Our  momentum  states  are  e^,  where  a  =  1, 2, ... ,  Ba,  and  where  a  denotes 
the  particle  speeds.  The  momentum  state-space  per  particle  speed  has  cardinal¬ 
ity  Ba.  Lattice  summations  over  odd  powers  of  e  must  vanish  by  symmetry.  The 
following  identities,  listed  up  to  the  fourth  moment,  hold  for  arbitrary  values  of 
Ba  and  spatial  dimension  D  [64] 
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C*  Implementation 

A  parallel  version  of  the  C-language  developed  by  Thinking  Machines  Cor¬ 
poration  is  the  C*  language[65] .  This  is  a  well  developed  language  in  spirit  very 
close  to  its  predecessor  -  it  is  as  concise  as  the  C-language  itself.  It  offers  many 
useful  constructs  making  the  coding  of  algorithms  for  parallel  data  very  effi¬ 
cient.  As  typical  of  most  parallel  languages,  an  array  operation  is  handled  in  a 
single  instruction  —  for  the  most  part  programming  loops  do  not  appear  in  the 
code.  Most  parallel  computation  is  achieved  by  data  movement.  The  geometry 
of  the  problem  is  specified  at  the  onset  by  defining  your  the  data  structure’s 
shape.  This  is  usually  a  D-dimensional  array  with  a  certain  size  in  each  dimen¬ 
sion.  The  shape  definition  defines  all  the  needed  communication  topology  for 
the  compiler.  It  is  possible  to  declare  boolean  shapes  in  C*.  We  have  used  this 
feature  to  encode  each  bit  plane  of  the  lattice-gas.  This  is  a  convenient  feature 
of  the  language  making  efficient  use  of  memory.  Normally,  in  a  lattice-gas  code, 
one  must  extract  and  insert  individual  bits  at  the  lattice  sites.  The  option  of 
working  directly  with  boolean  arrays  has  therefore  simplified  the  coding  effort 
substantially.  If  individual  elements  of  a  parallel  array  must  be  accessed,  C* 


28 


uses  the  syntax  of  parallel  left  indexing.  Right  indexing  of  arrays  is  reserved 
for  its  usual  C-language  meaning.  We  use  right  indexed  arrays  to  represent  the 
individual  bit  planes  of  the  lattice  gas. 

We  have  implemented  a  two-dimensional  hexagonal  lattice  embedded  into  a 
three-dimensional  mesh.  This  implementation  is  equivalent  to  our  implemention 
in  CAMForth  on  the  CAM-8  and  GCC/DPEAC  on  the  CM-5.  Streaming  of 
pleat  0  and  pleat  1  are  coded  separately.  We  give  a  C*  code  fragment  for  this 
embedding.  Note  that  the  comments  to  the  right  of  the  lines  correspond  exactly 
to  the  kick  components  listed  in  Table  1.  With  a  few  C*  lines  of  code  one  can 
completely  implement  hexagonal  lattice-gas  streaming.  We  have  used  the  C* 
command  to-torus-dim(destination  pointer,  source,  axis,  distance)  to  shift  the 
bit  planes  with  toroidal  boundary  conditions.  This  is  an  efficient  communication 
routine  for  sending  data  in  a  regular  fashion  using  grid  communications.  The 
partitioning  of  the  space  between  processors  is  handled  completely  by  the  C* 
compiler.  We  will  see  how  efficiently  the  compiler  does  this  partitioning  in  the 
discussion  to  follow. 

We  have  tested  our  C*  implementation  for  different  situations.  Given  a 
certain  lattice  size,  for  example  1024  x  2048,  with  have  found  the  performance 
of  the  CM-5  to  vary  linearly  with  the  number  of  processing  nodes.  This  linear 
variation  is  expected  so  long  as  the  lattice  size  is  sufficiently  large.  To  determine 
a  reasonable  lattice  size,  we  have  performed  repeated  simulations  with  different 
lattice  sizes  but  with  a  fixed  number  of  processors.  The  results  obtained  for  a 
fixed  256-node  partition  of  the  CM-5  is  given  in  figure  10,  in  which  we  plotted 
simulation  site  update  rates  for  lattice  sizes  64  x  128, 128  x  256, . . . ,  8192  x 
16384.  For  small  lattice  sizes,  the  performance  is  very  poor,  on  the  order  of  a 
million  site  updates  per  seconds.  This  is  because  the  streaming  is  limited  by 
processor  to  processor  communication  bandwidth.  As  the  lattice  size  increases, 
the  number  of  sites  interior  to  the  node  grows  and  the  number  of  sites  on  the 
partition  boundary  deceases.  Consequently,  the  site  update  rate  continuously 
improves  with  larger  lattices.  The  update  rate  asymtotically  approaches  about 
25  million  site  updates  per  second.  This  is  equivalent  to  approximately  100,000 
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sites  updates  per  processing  node.  This  is  roughly  the  maximum  update  rate 
achievable  on  a  SPARCstation  1.  A  full  512-node  partition  has  a  peak  rate  of 
about  50  million  site  updates  per  second,  which  is  about  one  quarter  the  speed 
of  the  CAM-8. 
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Table  1:  Streaming  for  2D  Hex  Lattice  Embedded  a  into  3D  Mesh 


Direction  x  y  z 

Pleat  0 
1 
2 

3 

4 

5 

6  _ 

Pleat  1 
1  -11-1 

2  -10-1 

3  0-10 

4  0  0  -1 

5  0  1-1 

6  0  10 


0  0  1 
0-11 
0-10 
1-11 
10  1 
0  10 


!We  have  written  lattice-Boltzmann  code  in  the  parallel  C-Star  language  on  the  Connection 
Machine- 5. 

2 Lattice  Boltzmann  simulation  for  three  dimensional  flows  with  Reynolds  numbers  of  about 
50,000  were  presented  in  June  1993  at  the  International  Conference  on  Pattern  Formation  and 
Lattice-Gas  Automata  sponsored  by  the  Fields  Institute,  Waterloo,  Canada. 

3  Actually,  the  most  straight  foward  way  to  calculate  the  matrix  elements  of  J  is  to  write  a 
Mathematica  code. 

4Locally,  the  CAM-8  is  not  limited  to  performing  only  permutations,  it  can  do  general 
mappings.  However,  since  we  are  interested  in  only  particle  conserving  reversible  dynamics, 
permutations  are  sufficient. 

5 Currently  the  largest  CM-5  resides  at  Los  Alamos  National  Laboratory  with  1024  Nodes. 
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Figure  1:  Illustration  on  a  hexagonal  lattice  of  the  two-step  collision  and  streaming  process 
required  to  complete  a  single  time  step:  (a)  initial  configuration,  (b)  collision  by  permutation, 
and  (c)  streaming  of  particles. 
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Figure  2:  (a)  Lattice  vector  label  convention;  (b)  Hexagonal  lattice  convention  with  lattice 
directions  a  =  3  up  and  a  =  6  down.  Coordinates  above  the  lattice  nodes  are  (i,j)  memory 
array  indices. 
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Figure  3:  MIT  Laboratory  for  Computer  Science  cellular  automata  machine  CAM-8.  This  8 
module  prototype  can  evolve  a  D-dimensional  cellular  space  with  32  million  sites  where  each 
site  has  16  bits  of  data  with  a  site  update  rate  of  200  million  per  second. 


Figure  4:  (a)  A  single  processing  node,  with  DRAM  site  data  flowing  through  an  SRAM 
lookup  table  and  back  into  DRAM,  (b)  Spatial  array  of  CAM-8  nodes,  with  nearest-neighbor 
(mesh)  interconnect  (1  wire/bit-slice  in  each  direction). 
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Figure  5:  CM-5  Node:  SPARCCPU,  32  Mbytes  of  memory  and  4  Vector  processing  units. 
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Figure  6:  Node  Memory  Layout. 
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Figure  7:  Thermo  13-bit  CAM-8  Experiment:  Rayleigh-Benard  convection  cells  at  the 
critical  Rayleigh  number.  Lattice  Size:  2048  x  1024.  Time  Average:  100.  Spatial  Average: 
64  x  64.  Mass  Density  Fraction=l/5.  Data  presented  at  50,000  time  steps. 
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Figure  8:  Momentum  and  vorticity  map  of  two-dimensional  shear  instability  on  the  CAM- 
8.  Lattice  size  of  4096  x  2048  with  toroidal  boundary  conditions.  Spacetime  averaging  over 
128x128  blocks  for  50  time  steps.  FHP  collisions  with  spectators  and  a  rest  particle.  Data 
presented  at  time  steps  0,  10000,  and  30000  with  Galilean  velocity  shift. 
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Figure  9:  FHP-II  CM-5  Experiment:  Von  Karman  Streets  Lattice  Size:  4096  x  2048.  Time 
Average:  None.  Spatial  Average:  128x128.  Mass  Density  Fraction=l/7.  Data  presented  at 
32,000  time  steps. 


Figure  10:  Performance  runs  on  a  256-node  CM-5  for  an  FHP  hexagonal  lattice  embedded 
into  a  3D  mesh.  Performance  significantly  suffers  by  communication  overhead  for  small  lattice 
sizes. 
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